Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers

Authors

  • Mathieu Faverge
  • Julien Herrmann
  • Julien Langou
  • Bradley R. Lowery
  • Yves Robert
  • Jack J. Dongarra
Abstract

This paper introduces hybrid LU–QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate between LU elimination steps with local pivoting and QR elimination steps, based on a robustness criterion. LU elimination steps can be parallelized very efficiently and cost half as many floating-point operations as QR steps. However, LU steps are not necessarily stable, while QR steps are always stable. The hybrid algorithms execute a QR step when a robustness criterion detects some risk of instability, and they execute an LU step otherwise. The choice between LU and QR steps must have a small computational overhead and must provide a satisfactory level of stability with as few QR steps as possible. In this paper, we introduce several robustness criteria and we establish upper bounds on the growth factor of the norm of the updated matrix incurred by each of these criteria. In addition, we describe the implementation of the hybrid algorithms through an extension of the PaRSEC software to allow for dynamic choices during execution. Finally, we analyze both stability and performance results compared to state-of-the-art linear solvers on parallel distributed multicore platforms. A comprehensive set of experiments shows that hybrid LU–QR algorithms provide a continuous range of trade-offs between stability and performance. © 2015 Published by Elsevier Inc.
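
The alternation described in the abstract can be illustrated with a small serial sketch. This is not the authors' PaRSEC implementation and does not reproduce any of the paper's robustness criteria: the function name `hybrid_lu_qr_solve`, the block size `nb`, the threshold `alpha`, and the test "alpha times the smallest singular value of the diagonal block is at least the 2-norm of the blocks below it" are all assumptions chosen for illustration. The LU step eliminates the blocks below the diagonal block without pivoting outside that block, and the QR step applies an orthogonal transformation to the whole trailing panel.

```python
import numpy as np

def hybrid_lu_qr_solve(A, b, nb=2, alpha=1.0):
    """Illustrative hybrid LU/QR block elimination for Ax = b.

    Hypothetical criterion (an assumption, not taken from the paper):
    take an LU step when alpha * sigma_min(diagonal block) is at least
    the 2-norm of the blocks below it; otherwise take a QR step.
    Returns the solution and the list of step types that were chosen.
    """
    n = A.shape[0]
    # Work on the augmented matrix [A | b] so b is updated with A.
    M = np.hstack([np.asarray(A, dtype=float),
                   np.asarray(b, dtype=float).reshape(n, 1)])
    steps = []
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        diag = M[k0:k1, k0:k1]
        below = M[k1:, k0:k1]
        sigma_min = np.linalg.svd(diag, compute_uv=False)[-1]
        below_norm = np.linalg.norm(below, 2) if below.size else 0.0
        if sigma_min > 0 and alpha * sigma_min >= below_norm:
            # LU step: block elimination; pivoting stays inside the
            # diagonal block (np.linalg.solve pivots locally).
            if below.size:
                M[k1:, k0:] -= below @ np.linalg.solve(diag, M[k0:k1, k0:])
            steps.append("LU")
        else:
            # QR step: orthogonal transformation of the trailing rows,
            # which zeroes the panel below the diagonal block.
            Q, _ = np.linalg.qr(M[k0:, k0:k1], mode="complete")
            M[k0:, k0:] = Q.T @ M[k0:, k0:]
            steps.append("QR")
    # Block back substitution on the block upper triangular result.
    x = np.zeros(n)
    for k0 in reversed(range(0, n, nb)):
        k1 = min(k0 + nb, n)
        rhs = M[k0:k1, n] - M[k0:k1, k1:n] @ x[k1:]
        x[k0:k1] = np.linalg.solve(M[k0:k1, k0:k1], rhs)
    return x, steps
```

In this sketch, a larger `alpha` accepts more LU steps (cheaper, less conservative), while `alpha = 0` forces a QR step whenever there is anything to eliminate below the diagonal block. Calling `x, steps = hybrid_lu_qr_solve(A, b, nb=2, alpha=1.0)` returns the solution together with the sequence of step types, which makes the trade-off easy to inspect on small examples.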

Similar resources

Performance Predictions of Multilevel Communication Optimal LU and QR Factorizations on Hierarchical Platforms

In this paper we study the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We note that we focus on multilevel QR factorization, and give a brief description of the multilevel LU factorization. We first introduce a performance model called Hierarchical Cluster Platform (Hcp), encapsulating the characteristics ...

PLAPACK: High Performance through High-Level Abstraction

Coding parallel algorithms is generally regarded as a formidable task. To make this task manageable in the arena of linear algebra algorithms, we have developed the Parallel Linear Algebra Package (PLAPACK), an infrastructure for coding such algorithms at a high level of abstraction. It is often believed that by raising the level of abstraction in this fashion, performance is sacrificed. Throug...

A collection of parallel linear equations routines for the Denelcor HEP

This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer. The algorithms used here are based on high-level modules that facilitate portability and perform efficiently in a wide range of environments. The modules are chosen to be of a large enough computational granularity so that reasonably optimum performance may be...

Blendenpik: Supercharging LAPACK's Least-Squares Solver

Several innovative random-sampling and random-mixing techniques for solving problems in linear algebra have been proposed in the last decade, but they have not yet made a significant impact on numerical linear algebra. We show that by using a high-quality implementation of one of these techniques, we obtain a solver that performs extremely well in the traditional yardsticks of numerical linear ...

Towards a multifrontal QR factorization for heterogeneous architectures over runtime systems

During the last decade, computer architectures for high performance computing have evolved considerably toward heterogeneous systems equipped with different types of computational units and a higher number of cores per chip. GPU-based systems are a popular example of heterogeneous architectures widely adopted in the high performance computing domain. In the work presented in this talk we stud...

Journal:
  • J. Parallel Distrib. Comput.

Volume 85

Publication date: 2015